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ABSTRACT 

Searches for unknown physics and decisions between competing astrophysical mod¬ 
els to explain data both rely on statistical hypothesis testing. The usual approach in 
searches for new physical phenomena is based on the statistical Likelihood Ratio Test 
(LRT) and its asymptotic properties. In the common situation, when neither of the 
two models under comparison is a special case of the other i.e., when the hypotheses 
are non-nested, this test is not applicable. In astrophysics, this problem occurs when 
two models that reside in different parameter spaces are to be compared. An important 
example is the recently reported excess emission in astrophysical y-rays and the ques¬ 
tion whether its origin is known astrophysics or dark matter. We develop and study a 
new, simple, generally applicable, frequentist method and validate its statistical prop¬ 
erties using a suite of simulations studies. We exemplify it on realistic simulated data 
of the Fermi-LAT y-ray satellite, where non-nested hypotheses testing appears in the 
search for particle dark matter. 
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1 MODEL COMPARISON IN 
ASTROPARTICLE PHYSICS 

In astrophysics, hypothesis testing is ubiquitous, because 
progress is made by comparing competing models to ex¬ 
perimental data. In the special case, where new physical 
phenomena are searched for, the most common choice of 
hypothesis test is the Likelihood Ratio Test (LRT), whose 
popularity is partly motivated by the fact that, assuming Ho 
is true, the asymptotic distribution of the LRT statistic is a 
. Such result holds if the regularity conditions specified in 
Wilks’s theorem hold (Wilks 1938). A key necessary condi¬ 
tion is “nested-ness”, meaning that there is a full model of 
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which both the models under Hq and the alternative hypoth¬ 
esis, Hi, are special cases. This is obviously the case for the 
search for new particles where the null hypothesis (or base¬ 
line model), Ho, is given by “background” and Hi is given by 
“background-bsignal of new particle”. However, cases where 
model comparison is non-nested are common: for instance, 
when a known astrophysical signal can be confused with new 
physics, see Ackermann et al. (2012) for an example from as¬ 
troparticle physics, or if the models to be compared reside 
in different parameter spaces (Profumo & Linden 2012); as 
in gamma-ray bursts (Guiriec et al. 2015). In these situa¬ 
tions, Monte Carlo simulations of the measurement process 
are often the only possibility, but are challenged by stringent 
significance requirements, e.g., at the 5a level. We present 
a solution that allows evaluation of accurate statistical sig¬ 
nificances for non-nested model comparison while avoiding 
extensive Monte Carlo simulations. As a concrete example, 
we apply the proposed procedure to the search for particle 
dark matter, where the method has particular importance. 

One way to search for dark matter is to consider its 
hypothesized annihilation products, i.e., 7 -rays, that can be 
detected by space borne or ground based 7 -rays telescopes 
(Conrad, Cohen-Tanugi & Stigari 2015). Here, the issue of 
source confusion is one of the most challenging aspects of 
claiming discovery of a dark matter induced signal. A de- 
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tected excess of 7 -rays may either originate from dark mat¬ 
ter annihilation or be caused by conventional, known as- 
trophysical sources. Discrimination can be performed using 
their spectral distributions, however these are not necessar¬ 
ily part of the same parameter space (see below). This situa¬ 
tion arises for example in the search for dark matter sources 
among the unidentified sources found by Fermi-LAT (Ack- 
ermann et al. 2012 ), the claimed detection of a signal consis¬ 
tent with dark matter in our own galaxy, which has gained 
much attention recently (Daylan et al. 2014), or (once a 
detection has been made) in the search for dark matter in 
dwarf galaxies (Ackermann et al. 2011, 2014, 2015; Geringer- 
Sameth & Koushiappas 2011; Geringer-Sameth et al. 2015). 
In the recent claims, the existence of a source of 7 -rays (over 
some background) is established by a LRT, but the crucial 
and unsolved question is not whether a 7 -ray source exists, 
but whether it can be explained by conventional sources of 
7 -rays as opposed to dark matter annihilation. This is a 
prime example of an non-nested model comparison. For def¬ 
initeness, we can assume f{y,Eo,4‘) « is the 

probability density function (pdf) of the 7 -rays energies, 
denoted by y, originating from known cosmic sources and 
g{y,M^) oc 0.73(j^) exp{—7.8j^} is the pdf of the 
7 -ray energies of dark matter (Bergstrom, Ullio & Buckley 
1998). The goal is to decide if f{y, Eq, (j>) is sufficient to ex¬ 
plain the data {Hq) or if g{y,M^) (Hi) provides a better 
fit. 


Although the issue of comparing non-nested models has 
been addressed since the early days of modern statistics (Gox 
1961, 1962, 2013; Atkinson 1970; Quandt 1974), as well as in 
the more recent physical literature (Pilla, Loader & Taylor 
2005; Pilla & Loader 2006), a method with the desired sta¬ 
tistical properties, easy implementation and computational 
efficiency in astrophysics is still lacking. 

This article is arranged as follows. Section 2 reviews the 
LRT, Wilks’s theorem and their extensions to non-regular 
situations. Our proposal for testing non-tested models is in¬ 
troduced in Section 3, validated via simulation studies in 
Section 4, and applied to a realistic simulation of the Fermi- 
LAT 7 -ray satellite in Section 5. General discussion appears 
in Section 6 . 


2 WILKS, CHERNOFF AND TRIAL FACTORS 


Let f{y, a) and g{y, /3) be pdfs of the background and signal, 
where y is the detected energy, a and j3 are parameters. 
Suppose observed particles are a mixture of background and 
source, i.e., 

(1-»7)/(2/,a) + w(j/,/3) (1) 


where 0 ^ 77 ^ 1 is the proportion of signal counts. A 
hypothesis test can be specified as Hq : g = rjo versus 
Hi : g > go, and if /3 is known the LRT statistic by 


Tip) 


-2 log 


Ljgo, ap,-) 
L{gi,ai,P)' 


( 2 ) 


where L{g, a, /?) is the likelihood function under (1). The nu¬ 
merator and denominator of ( 2 ) are the maximum likelihood 
achievable under Ho and Hi, respectively with ap being the 
MLE of a under Hg and Ai and gi the MLEs under Hi. 
(Wilks 1938) states that when Ho is true and when testing 


for a one-dimensional parameter (in this case g), T{P) is 
asymptotically distributed as a Xi (fh® subscript being the 
degrees of freedom). Among the regularity conditions which 
guarantee this result are: 

RGl. The models are nested, meaning that there is a full 
model of which both Hq and Hi are special cases. 

RG2. The set of possible parameters of Hp is on the inte¬ 
rior of that for the full model. 

RG3. The full model is identifiable under Hq. 

Unfortunately in practice, it is common to encounter non¬ 
regular problems. Notice for example, if P is known but go = 
0, RC2 does not hold. In this case, Ghernoff (1954) applies; 
it generalizes Wilks and states that if Ho is on the boundary 
of the parameter space, the asymptotic distribution of T(/3) 
is an equal mixture of a Xi and a Dirac delta function at 0, 
namely Ixi + |^( 0 )- 

Further, if ryo = 0 (on the boundary) and P is un¬ 
known, the model in ( 1 ) is not identifiable under Ho and 
RG3 fails. This is known in statistics as a test of hypoth¬ 
esis where a nuisance parameter is defined only under Hi, 
or “trial correction” in astrophysical literature. A solution 
based on theoretical result of Davies (1987) is proposed by 
Gross & Vitells (2010). In particular, under Ho, T(/3) is a 
random process indexed by P, specifically if RC2 (but not 
RG3) holds {TiP),P £ B} is asymptotically a Xi-process. 
A natural choice of test statistic is sup^r(/3) and Gross 
& Vitells (2010) provides an approximation in the limit as 
c —>■ 00 for the tail probability P(sup^T(/l) > c). Finally, 
if both RG2 and RG3 fail to hold (e.g., the important case 
of ryo = 0 with P unknown), we show in our Supplemen¬ 
tary Material that because {TiP),P £ B} is a |xi + |'5(0) 
random process, 

P(supT(/I) > c) « -b E[N{co)\Ho]e-'^ (3) 

P 2 

where E[N{co)\Ho] is the expected number of upcrossings 
of the T{P) process over the threshold cq under Ho and co 
is chosen co << c. (Details of how to choose co are given 
in Gross & Vitells (2010), where (3) is also asserted, but 
without proof.) Although this approximation holds as c —> 
00 , when c is small, the right hand side of (3) is an upper 
bound for P(sup^r(/3) > c). Thus, basing inference on (3) 
is valid, though perhaps conservative. 


3 STATISTICAL COMPARISON OF 
NON-NESTED MODELS 

Suppose we wish to compare two pdfs, fiy,a) and giy,P), 
for which RGl does not apply, that is the two pdfs are not 
special cases of a full model and do not share a parameter 
space. Notice that in both / and g free parameters (i.e., a 
and P respectively) are present and thus, the problem cannot 
be reduced to a test for simple hypotheses as in Cousins 
(2005), see Cox (1961) for more details. We require P to be 
one dimensional and a to lie in the interior of its parameter 
space. The goal is to develop a test of the hypothesis: 

Ho ■■ fiy, ol) versus Hi : g{y, P) (4) 

Although fiy, a.) and giy, P) are non-nested, we can 
construct a comprehensive model which includes both as 
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special cases. There are two reasonable formulations. We 
encountered the first in ( 1 ); the second is proportional to 
{f{y,cx)}^~'^{g(y,l3)}^, with 0 r; ^ 1 in both formula¬ 
tions. As discussed in Cox (1962, 2013); Atkinson (1970) 
and Quandt (1974), there are advantages and disadvantages 
to both. From our perspective, the additive form in (1) has 
the advantage of more appealing mathematical properties. 
Since no normalizing constant is involved, the maximization 
of the log-likelihood reduces to numerical optimization. In 
contrast to the test discussed in Section 2, the model in (1) 
is not viewed as a mixture of astrophysical models in which 
a certain proportion of events, y, originates a process rep¬ 
resented by one model, and the the remaining proportion, 
l — y, originates from the completing process represented by 
the other model. Instead, (1) is a mathematical formaliza¬ 
tion used to embed the pdfs f{y,a) and g{y,P) and their 
corresponding parameters spaces into an overarching model 
via the auxiliary parameter y (Quandt 1974). The overarch¬ 
ing model has not astrophysical interpretation, but helps us 
reformulate the test in (4) into a suitable form, i.e.. 

Ho : y — 0 versus Hi : y > 0. (5) 

Perhaps a more natural formulation of (4) would be 
Hq : y — 0 versus Hi : jy = 1. Unfortunately, neither Wilks’s 
or Chernoff’s theorems apply to this formulation since they 
rely on the asymptotic normality of the MLE under Hq, 
which can only hold if there is a continuum of possible val¬ 
ues of y under Hi, with 77 = 0 in its interior. With indirect 
dark matter detection, the formulation in (5) allows the al¬ 
ternative model to include both the case where dark mat¬ 
ter and known cosmic sources are present simoultaneously 
(0 < r; < 1 ) and the case where only dark matter is present 
(77 = 1). In situations where intermediate values of 77 are not 
physical we might, in addition to (5), test Hq : 77 = 1 ver¬ 
sus Hi : 77 < 1, i.e., interchange the roles of the hypotheses 
as discussed in Cox (1962, 2013). In this case, the nuisance 
parameter a is required to be one dimensional i.e., a = a. 

Under model (1), testing (5) is equivalent to testing 77 
on the boundary with (3 only being defined under the alter¬ 
native. We can apply the methods discussed in Section 2 to 
solve this problem. Notice that such methods can still be 
applied if the two models share additional parameters, 7 , 
i.e., /(j/, 7 , 0 :) and g{jj,~f,j3). However, the maximized like¬ 
lihoods in ( 2 ) must be replaced by their profile counterparts 
L(0,7o,oo) and I/( 77 i, 71 , oi, /3) (Davison 2003). 


4 VALIDATION ON DARK MATTER MODELS 

We illustrate the reliability of the method proposed for test¬ 
ing non-nested models using two sets of Monte Carlo sim¬ 
ulations. In Test 1, we compare the two models introduced 
in Section 1 with the aim of distinguishing between a dark 
matter signal and a power law distributed cosmic source. In 
Test 2, we make the same comparison but in the presence of 
power law distributed background. In this case, Hq specifies 
as 

( 6 ) 

and Hi specifies 

_7 Q y 

SE^ e 

g{y,5,X,Eo,M^) = {l-X)—^+\^- -; (7) 




Figure 1. Comparing the approximation in (3) (solid blue lines) 
with Monte Carlo estimation of P(supT(M^) > c) (gray dashed 
lines), for Test 1 (upper panel) and Test 2 (lower panel). Ap¬ 
proximations correspondig to (3) without the Chernoff correction 
(blue dashed lines), a approximation (light blue dash-dotted 
lines) and a Chernoff-adjusted approximation (light blue dot¬ 
ted lines) are also reported. Monte Carlo p-values were obtained 
by simulating 10,000 datasets under Hq, each of size 10,000 for 
both simulations. For each simulated dataset sup^^ T{M^) was 
computed over an grid of size 100 for Test 1 and size 400 for 
Test 2. Monte Carlo errors (gray areas) were attained via error 
propagation (Cowan 1998). 


where ks and are the normalizing constants for each 
pdf, 0 < A < 1, 5 > 0, 0 > 0, -Eo = 1, y C [Eo, 100] and 

€ [£^ 0 , 100 ]. Note that in this case, the formulation in 
(1), with mixture parameter A, is first used to specify the 
signal existence over a (relatively well known) background, 
whilist in the next step, equation ( 1 ) is adopted as a merely 
mathematical tool to treat the non-nested case (as described 
previously). 

For simplicity, in Test 2, A, the proportion of events 
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Figure 2. Simulated type I errors (upper panel) and power func¬ 
tions (lower panel) for Test 1 with at Scr significance. Shaded 
areas indicate regions expected to contain 68% (dark gray) and 
95% (light gray) of the symbols if the nominal type I error of 
0.003 holds. For both the type I error and power curves 10000 
Monte Carlo simulations were used. 


coming from dark matter, was fixed to 0.2. In both tests, we 
estimated the average number of uprcrossings E[N{cq)\Ho] 
using 1,000 Monte Carlo simulations. Finally, the approx¬ 
imation to P(sup^r(/3) > c) is calculated using (3) on a 
grid of values of c. The results are compared with the re¬ 
spective Monte Carlo p-values in Figure 1 along with the 
and Chernoff corrections one might compute ignoring the 
regularity conditions in Section 2. 

For small c, the approximation in (3) is greater than 
its Monte Carlo counterpart. As c increases, however, the 
approximation converges to the Monte Carlo estimates for a 
good approximation to the p-value, P(sup^r(/?) > c). The 
and respective Chernoff-adjusted approximation lead to 
over optimistic p-values, whereas similar results to those at¬ 
tained with (3) are achieved when the factor of 2 that ac¬ 
counts for RCl is omitted. This is not surprising since the 


right hand side of (3), is dominated by E[N{co)\Ho] (which 
also explains the wide discrepancy between (3) and the x^ 
approximations in Figure 1) and in practice, when testing on 
the boundary of the parameter space, E[N{cq)\Ho] is typ¬ 
ically calculated simulating a |xi + |^( 0 ) random process 
directly. Thus, the Chernoff correction is automatically im¬ 
plemented in the leading term of (3). 

It is not uncommon in practice, e.g. in astronomy, for 
the number of counts to be considerably smaller than the 
10,000 used in Figure 1. Thus, we conduct a simulation study 
to verify the type I error (i.e., the rate of false rejections of 
Ho) of the method with smaller samples and verify that the 
approximate p-value in (3) holds. The upper panel of Fig¬ 
ure 2 reports the simulated type I errors with a detection 
threshold on the p-value of 0.003 (3cr) for different sample 
sizes when conducting Test 1. For sample sizes of at least 
100, the Monte Carlo results are consistent with the numer¬ 
ical 3 ct error rate. The lower panel of Figure 2 shows the 
power (probability of detection) curves at 3a of the same 
test for different sample sizes. For all the values of con¬ 
sidered, a sample size of 500 is sufficient to achieve a power 
of nearly 1 . 


5 APPLICATION TO SIMULATED DATA 
FROM THE FERMI-LAT 

The Fermi Large Area Telescope (LAT) (Atwood 2009) is a 
pair-conversion 7 -ray telescope on board the earth-orbiting 
Fermi satellite. It measures energies and images 7 -rays be¬ 
tween about a 100 MeV and several TeV. One particular 
aspect is the 7 -ray signal induced by dark matter annihi¬ 
lations, which gives rise to measurable signal from celestial 
objects, like the Milky Way center or dwarf galaxies. Here 
we apply the method proposed in this letter to a dataset 
simulated with realistic representations of the effects of the 
detector and present backgrounds. We considered a 5 years 
observation of putative dark matter source (dwarf galaxy¬ 
like) with dark matter annihilating into b-quark pairs and a 
mass of the dark matter particle of 35 GeV. This assumption 
is consistent with the most generic and popular models for 
dark matter, namely that it is in large part made of a Weakly 
Interacting Massive Particles (WIMP). It is also consistent 
with recent claims of evidence for dark matter. The signal 
normalization corresponds to about 200 events detected in 
the LAT. Roughly, this corresponds to a dark matter source 
at the distance of the dwarf galaxy Seguel (and with compa¬ 
rable dark matter density) and an annihilation cross-section 
of ~ 2-10“^®cm^s“^). We find a 4.198 ct significance in favor 
of the dark matter model. Scaling the event rate down to 
50 (i.e. considering a lower cross-section by a factor of 4 or 
lower density by a factor of 16) we obtain 2.984cr signifi¬ 
cance (result not shown). Adding complexity, we introduce 
a background, for example 7 -rays introduced by our own 
Galaxy. We then considered 2176 counts from a power-law 
distributed background source as in (6)-(7) and about 550 
dark matter events. For simplicity, the mixture parameter 
A is fixed at 0.2. In this case, we find 2.9a significance in 
favor of the model in (7). As expected, introducing back¬ 
ground significantly reduces the power for distinguishing a 
dark matter source from a conventional source. It should be 
noted however that (unlike in a full analysis) we do not at- 
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Ho 

N 

V 


sup LRT 

Sig. 

Test 1 

7 = 0 

200 

0.971 

27 

21.018 

4.038(7 


7 = 1 

200 

p-value = 0.528 

Test 2 

rj = 0 

2726 

0.999 1 30 1 12.096 1 2.673(7 


7 = 1 

2726 

p-value = 1 


Table 1. Summary of the analysis on the Fermi LAT simulation 
comparing the models in Tests 1 and 2. Estimates and Signifi¬ 
cances refer to the tests Hq : r] = 0 versus Hi ■. rj > 0. P-values 
refer to the tests Hq : r} = I versus Hi : 77 < 1 . 


tempt to reduce background by taking 7 -ray directions into 
account. 


6 SUMMARY & DISCUSSION 

We have presented a two-step solution to a common prob¬ 
lem in experimental astrophysics: comparing competing non¬ 
nested models. On the basis of the seminal work of Cox 
(1962, 2013) and Atkinson (1970) the first step of our strat¬ 
egy requires the specification of a comprehensive model 
which extends the parameter space of the models to be com¬ 
pared. The problem of testing non-nested model is then re¬ 
duced to the look-elsewhere effect, and thus the second step 
naturally recalls Gross & Vitells (2010) as an efficient solu¬ 
tion to accurately approximate the significance of new sig¬ 
nals. The resulting procedure is easy to implement, does not 
require extensive calculations on a case-by-case basis and 
is computationally more efficient than Monte Carlo simula¬ 
tions. Recent developments (Algeri et al. 2016) in the nested 
case illustrate additional desirable statistical properties of 
Gross & Vitells (2010) with respect to Pilla, Loader & Tay¬ 
lor (2005) and Pilla & Loader (2006). Given the nature of 
the methodology proposed in this letter, we expect these 
finding to carry over to the non-nested case. 

An example of testing non-nested models arises in the 
search for particle dark matter. We use this example to vali¬ 
date and illustrate the procedure. We also demonstrate good 
performance in a realistic simulation of data that is collected 
with the Fermi-LAT 7 -ray detector and used in the search 
for signal from dark matter annihilation. 

Although any pair of hypothesized models can be ex¬ 
pressed as a special case of ( 1 ), this formulation alone does 
not always provide a mechanism for a statistical hypothesis 
test. In the non-nested scenario analysed in this letter, valid 
inference for the test in (5) can be achieved by applying the 
methodology in Gross & Vitells (2010). This method, how¬ 
ever, cannot handle multi-dimensional parameters that are 
defined only under Hi, nor can it deal with nuisance param¬ 
eters under Ho which lie on the boundary of their parameter 
space. A possible approach to tackle the first limitation is to 
apply the theory in Vitells & Gross (2011) to the compre¬ 
hensive model in (1). Whereas an extention of the method 
to overcome the second limitation could rely on the the¬ 
ory in Self & Liang (1987). In light of this, the methodology 
proposed is particularly suited to comparisons of non-nested 
models where these limitations often do not arise. 

Software for the methodology illuatsrated in this let¬ 
ter is available at: http: //wwwf . imperial. ac .uk/"'sa2514/ 
Research. html. 
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